{
  "cells": [
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "OsXAs2gcIpbC"
      },
      "outputs": [],
      "source": [
        "# Copyright 2024 Google LLC\n",
        "#\n",
        "# Licensed under the Apache License, Version 2.0 (the \"License\");\n",
        "# you may not use this file except in compliance with the License.\n",
        "# You may obtain a copy of the License at\n",
        "#\n",
        "#     https://www.apache.org/licenses/LICENSE-2.0\n",
        "#\n",
        "# Unless required by applicable law or agreed to in writing, software\n",
        "# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
        "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
        "# See the License for the specific language governing permissions and\n",
        "# limitations under the License."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "5e_7VOHBer8D"
      },
      "source": [
        "# Evaluate a Translation Model\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        " <table align=\"left\">\n",
        "  <td style=\"text-align: center\">\n",
        "    <a href=\"https://colab.research.google.com/github/GoogleCloudPlatform/generative-ai/blob/main/gemini/evaluation/evaltask_approach/evaluate_translation.ipynb\">\n",
        "      <img width=\"32px\" src=\"https://www.gstatic.com/pantheon/images/bigquery/welcome_page/colab-logo.svg\" alt=\"Google Colaboratory logo\"><br> Open in Colab\n",
        "    </a>\n",
        "  </td>\n",
        "  <td style=\"text-align: center\">\n",
        "    <a href=\"https://console.cloud.google.com/vertex-ai/colab/import/https:%2F%2Fraw.githubusercontent.com%2FGoogleCloudPlatform%2Fgenerative-ai%2Fmain%2Fgemini%2Fevaluation%2Fevaltask_approach%2Fevaluate_translation.ipynb\">\n",
        "      <img width=\"32px\" src=\"https://lh3.googleusercontent.com/JmcxdQi-qOpctIvWKgPtrzZdJJK-J3sWE1RsfjZNwshCFgE_9fULcNpuXYTilIR2hjwN\" alt=\"Google Cloud Colab Enterprise logo\"><br> Open in Colab Enterprise\n",
        "    </a>\n",
        "  </td>\n",
        "  <td style=\"text-align: center\">\n",
        "    <a href=\"https://console.cloud.google.com/vertex-ai/workbench/deploy-notebook?download_url=https://raw.githubusercontent.com/GoogleCloudPlatform/generative-ai/main/gemini/evaluation/evaltask_approach/evaluate_translation.ipynb\">\n",
        "      <img src=\"https://www.gstatic.com/images/branding/gcpiconscolors/vertexai/v1/32px.svg\" alt=\"Vertex AI logo\"><br> Open in Vertex AI Workbench\n",
        "    </a>\n",
        "  </td>\n",
        "  <td style=\"text-align: center\">\n",
        "    <a href=\"https://github.com/GoogleCloudPlatform/generative-ai/blob/main/gemini/evaluation/evaltask_approach/evaluate_translation.ipynb\">\n",
        "      <img width=\"32px\" src=\"https://raw.githubusercontent.com/primer/octicons/refs/heads/main/icons/mark-github-24.svg\" alt=\"GitHub logo\"><br> View on GitHub\n",
        "    </a>\n",
        "  </td>\n",
        "</table>\n",
        "\n",
        "<div style=\"clear: both;\"></div>\n",
        "\n",
        "<b>Share to:</b>\n",
        "\n",
        "<a href=\"https://www.linkedin.com/sharing/share-offsite/?url=https%3A//github.com/GoogleCloudPlatform/generative-ai/blob/main/gemini/evaluation/evaltask_approach/evaluate_translation.ipynb\" target=\"_blank\">\n",
        "  <img width=\"20px\" src=\"https://upload.wikimedia.org/wikipedia/commons/8/81/LinkedIn_icon.svg\" alt=\"LinkedIn logo\">\n",
        "</a>\n",
        "\n",
        "<a href=\"https://bsky.app/intent/compose?text=https%3A//github.com/GoogleCloudPlatform/generative-ai/blob/main/gemini/evaluation/evaltask_approach/evaluate_translation.ipynb\" target=\"_blank\">\n",
        "  <img width=\"20px\" src=\"https://upload.wikimedia.org/wikipedia/commons/7/7a/Bluesky_Logo.svg\" alt=\"Bluesky logo\">\n",
        "</a>\n",
        "\n",
        "<a href=\"https://twitter.com/intent/tweet?url=https%3A//github.com/GoogleCloudPlatform/generative-ai/blob/main/gemini/evaluation/evaltask_approach/evaluate_translation.ipynb\" target=\"_blank\">\n",
        "  <img width=\"20px\" src=\"https://upload.wikimedia.org/wikipedia/commons/5/5a/X_icon_2.svg\" alt=\"X logo\">\n",
        "</a>\n",
        "\n",
        "<a href=\"https://reddit.com/submit?url=https%3A//github.com/GoogleCloudPlatform/generative-ai/blob/main/gemini/evaluation/evaltask_approach/evaluate_translation.ipynb\" target=\"_blank\">\n",
        "  <img width=\"20px\" src=\"https://redditinc.com/hubfs/Reddit%20Inc/Brand/Reddit_Logo.png\" alt=\"Reddit logo\">\n",
        "</a>\n",
        "\n",
        "<a href=\"https://www.facebook.com/sharer/sharer.php?u=https%3A//github.com/GoogleCloudPlatform/generative-ai/blob/main/gemini/evaluation/evaltask_approach/evaluate_translation.ipynb\" target=\"_blank\">\n",
        "  <img width=\"20px\" src=\"https://upload.wikimedia.org/wikipedia/commons/5/51/Facebook_f_logo_%282019%29.svg\" alt=\"Facebook logo\">\n",
        "</a>\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "e_1wphBter8E"
      },
      "source": [
        "| | |\n",
        "|-|-|\n",
        "|Author(s) | [Caleb Mbakwe](https://github.com/caleboau2012) |"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "AtUbwvxier8E"
      },
      "source": [
        "## Overview\n",
        "\n",
        "In this tutorial, you will learn how to use the Vertex AI Python SDK for [Gen AI Evaluation Service](https://cloud.google.com/vertex-ai/generative-ai/docs/models/evaluation-overview) to measure the translation quality of your LLM responses using [BLEU](https://en.wikipedia.org/wiki/BLEU), [MetricX](https://github.com/google-research/metricx) and [COMET](https://unbabel.github.io/COMET/html/index.html).\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "CCkzFPrEer8F"
      },
      "source": [
        "## Getting Started"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "N0Wchgzqer8F"
      },
      "source": [
        "### Install Vertex AI Python SDK for Gen AI Evaluation Service"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "T-Cgoq37er8F"
      },
      "outputs": [],
      "source": [
        "%pip install --upgrade --user --quiet google-cloud-aiplatform[evaluation]"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "lOzpVrBLer8F"
      },
      "source": [
        "### Restart runtime\n",
        "To use the newly installed packages in this Jupyter runtime, you might need to restart the runtime. You can do this by running the cell below, which restarts the current kernel.\n",
        "\n",
        "The restart might take a minute or longer. After it's restarted, continue to the next step."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "5i8OM_85er8F"
      },
      "outputs": [],
      "source": [
        "# import IPython\n",
        "\n",
        "# app = IPython.Application.instance()\n",
        "# app.kernel.do_shutdown(True)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "F8NVZ7z6er8G"
      },
      "source": [
        "<div class=\"alert alert-block alert-warning\">\n",
        "<b>⚠️ The kernel is going to restart. Wait until it's finished before continuing to the next step. ⚠️</b>\n",
        "</div>\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "eiYxfbBJer8G"
      },
      "source": [
        "### Authenticate your notebook environment (Colab only)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "MgUI8ziHer8G"
      },
      "outputs": [],
      "source": [
        "import sys\n",
        "\n",
        "if \"google.colab\" in sys.modules:\n",
        "    from google.colab import auth\n",
        "\n",
        "    auth.authenticate_user()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "ka4wP_ljer8G"
      },
      "source": [
        "### Set Google Cloud project information and initialize Vertex AI SDK"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "N-i6OtB9er8G"
      },
      "outputs": [],
      "source": [
        "PROJECT_ID = \"[your-project-id]\"  # @param {type:\"string\"}\n",
        "LOCATION = \"us-central1\"  # @param {type:\"string\"}\n",
        "EXPERIMENT_NAME = \"my-eval-task-experiment\"  # @param {type:\"string\"}\n",
        "\n",
        "if not PROJECT_ID or PROJECT_ID == \"[your-project-id]\":\n",
        "    raise ValueError(\"Please set your PROJECT_ID\")\n",
        "\n",
        "\n",
        "import vertexai\n",
        "\n",
        "vertexai.init(project=PROJECT_ID, location=LOCATION)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "dvhI92xhQTzk"
      },
      "source": [
        "### Import libraries"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "qP4ihOCkEBje"
      },
      "outputs": [],
      "source": [
        "# General\n",
        "import pandas as pd\n",
        "\n",
        "# Main\n",
        "from vertexai import evaluation\n",
        "from vertexai.evaluation.metrics import pointwise_metric"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "gT_OJBHfCg4Q"
      },
      "outputs": [],
      "source": [
        "# @title Helper functions\n",
        "from IPython.display import Markdown, display\n",
        "\n",
        "\n",
        "def display_eval_result(eval_result, metrics=None, model_name=None, rows=0):\n",
        "    if model_name is not None:\n",
        "        display(Markdown(\"## Eval Result for %s\" % model_name))\n",
        "\n",
        "    \"\"\"Display the evaluation results.\"\"\"\n",
        "    summary_metrics, metrics_table = (\n",
        "        eval_result.summary_metrics,\n",
        "        eval_result.metrics_table,\n",
        "    )\n",
        "\n",
        "    metrics_df = pd.DataFrame.from_dict(summary_metrics, orient=\"index\").T\n",
        "    if metrics:\n",
        "        metrics_df = metrics_df.filter(\n",
        "            [\n",
        "                metric\n",
        "                for metric in metrics_df.columns\n",
        "                if any(selected_metric in metric for selected_metric in metrics)\n",
        "            ]\n",
        "        )\n",
        "        metrics_table = metrics_table.filter(\n",
        "            [\n",
        "                metric\n",
        "                for metric in metrics_table.columns\n",
        "                if any(selected_metric in metric for selected_metric in metrics)\n",
        "            ]\n",
        "        )\n",
        "\n",
        "    # Display the summary metrics\n",
        "    display(Markdown(\"### Summary Metrics\"))\n",
        "    display(metrics_df)\n",
        "    if rows > 0:\n",
        "        # Display samples from the metrics table\n",
        "        display(Markdown(\"### Row-based Metrics\"))\n",
        "        display(metrics_table.head(rows))"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "jo7ahGbnnYfp"
      },
      "source": [
        "# Set up eval metrics for your data."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "aG3kUfTmoAwb"
      },
      "source": [
        "You can evaluate the translation quality of your data generated from an LLM using:\n",
        "- [BLEU](https://en.wikipedia.org/wiki/BLEU)\n",
        "- [COMET](https://unbabel.github.io/COMET/html/index.html)\n",
        "- [MetricX](https://github.com/google-research/metricx)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "EGe_vlUvPVOM"
      },
      "outputs": [],
      "source": [
        "metrics = [\n",
        "    \"bleu\",\n",
        "    # See https://github.com/googleapis/python-aiplatform/blob/4e332de345ef3cc4d5f99f11d6499a3334e3345f/vertexai/evaluation/metrics/pointwise_metric.py#L82 for options.\n",
        "    pointwise_metric.Comet(version=\"COMET_22_SRC_REF\"),  # Reference based COMET\n",
        "    # See https://github.com/googleapis/python-aiplatform/blob/4e332de345ef3cc4d5f99f11d6499a3334e3345f/vertexai/evaluation/metrics/pointwise_metric.py#L115 for options.\n",
        "    pointwise_metric.MetricX(version=\"METRICX_24_SRC\"),  # Reference free MetricX.\n",
        "]"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "xITUrMB_nbsg"
      },
      "source": [
        "# Prepare your dataset\n",
        "\n",
        "Evaluate stored generative AI model responses in an evaluation dataset."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "XGY40wjrQWOc"
      },
      "outputs": [],
      "source": [
        "sources = [\n",
        "    \"Dem Feuer konnte Einhalt geboten werden\",\n",
        "    \"Schulen und Kindergärten wurden eröffnet.\",\n",
        "]\n",
        "\n",
        "responses = [\n",
        "    \"The fire could be stopped\",\n",
        "    \"Schools and kindergartens were open\",\n",
        "]\n",
        "\n",
        "references = [\n",
        "    \"They were able to control the fire.\",\n",
        "    \"Schools and kindergartens opened\",\n",
        "]\n",
        "\n",
        "eval_dataset = pd.DataFrame(\n",
        "    {\n",
        "        \"source\": sources,\n",
        "        \"response\": responses,\n",
        "        \"reference\": references,\n",
        "    }\n",
        ")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "ZJg5FdWfnhjz"
      },
      "source": [
        "# Run evaluation\n",
        "\n",
        "With the evaluation dataset and metrics defined, you can run evaluation for an `EvalTask` on different models and applications, and many other use cases."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "rOvo-LpsQTIj"
      },
      "outputs": [],
      "source": [
        "eval_task = evaluation.EvalTask(\n",
        "    dataset=eval_dataset, metrics=metrics, experiment=EXPERIMENT_NAME\n",
        ")\n",
        "eval_result = eval_task.evaluate()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "GwnZMsXBnjS7"
      },
      "source": [
        "You can view the summary metrics and row-based metrics for each response in the `EvalResult`.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "2QOFq9YZROPr"
      },
      "outputs": [],
      "source": [
        "display_eval_result(eval_result, rows=2)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "BLMKgL2OnCyt"
      },
      "source": [
        "# Clean up\n",
        "\n",
        "Delete ExperimentRun created by the evaluation."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "Dv8drshKnEf2"
      },
      "outputs": [],
      "source": [
        "from google.cloud import aiplatform\n",
        "\n",
        "aiplatform.ExperimentRun(\n",
        "    run_name=eval_result.metadata[\"experiment_run\"],\n",
        "    experiment=eval_result.metadata[\"experiment\"],\n",
        ").delete()"
      ]
    }
  ],
  "metadata": {
    "colab": {
      "name": "evaluate_translation.ipynb",
      "toc_visible": true
    },
    "kernelspec": {
      "display_name": "Python 3",
      "name": "python3"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}
